Self-Organizing Maps of Words for Natural Language Processing Applications
Author
Abstract
Kohonen’s Self-Organizing Map (SOM) is a general unsupervised tool for ordering high-dimensional statistical data so that neighboring nodes on the map represent similar inputs. The SOM is often applied to numerical data in areas such as pattern recognition, signal processing, and multivariate statistical analysis. It can also be used to find statistical similarities between symbols if suitable contextual information is available, as it is for words in natural language texts. Using the short contexts of each word occurring in a text, the SOM algorithm is able to organize the words into grammatical and semantic categories represented on a two-dimensional array. The similarity of the categories is reflected in their distance relationships on the array. Such a word category map may then be utilized in applications such as the analysis of large document collections. In addition to describing information retrieval and textual data mining applications, this paper outlines the relation of word category maps to symbolic knowledge representation formalisms. The graded nature of the categorization performed by the SOM is discussed. The aim is also to provide an overview of the research results and of potential new areas.
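The idea of organizing words by their short contexts can be illustrated with a minimal sketch. It is not the paper's implementation: the random word codes, the one-word context window on each side, the grid size, the learning-rate and neighborhood schedules, and the helper names `context_vectors`, `best_unit`, and `train_som` are all assumptions chosen for illustration.

```python
import math
import random


def context_vectors(tokens, dim=8, seed=0):
    """Represent each word by the averaged random codes of its neighbors.

    Assumption: a context is one word to the left and one to the right.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    code = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in vocab}
    sums = {w: [0.0] * (2 * dim) for w in vocab}
    counts = {w: 0 for w in vocab}
    for i, w in enumerate(tokens):
        left = code[tokens[i - 1]] if i > 0 else [0.0] * dim
        right = code[tokens[i + 1]] if i + 1 < len(tokens) else [0.0] * dim
        for j, v in enumerate(left + right):
            sums[w][j] += v
        counts[w] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in vocab}


def best_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)),
    )


def train_som(data, rows=4, cols=4, epochs=50, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / steps
            alpha = 0.5 * (1.0 - frac)  # decaying learning rate
            sigma = max(0.5, (max(rows, cols) / 2) * (1.0 - frac))
            bmu = best_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = alpha * math.exp(-d2 / (2 * sigma ** 2))
                for j in range(dim):
                    w[j] += h * (x[j] - w[j])
            t += 1
    return weights


# Toy corpus (hypothetical): "cat" and "dog" occur in identical contexts.
tokens = "the cat sleeps . the dog sleeps . the cat eats . the dog eats .".split()
vectors = context_vectors(tokens)
som = train_som(list(vectors.values()))
placement = {w: best_unit(som, v) for w, v in vectors.items()}
```

In this toy corpus, "cat" and "dog" share exactly the same contexts, so they receive the same context vector and land on the same map node. With real text, words with merely similar contexts would tend to fall on the same or neighboring nodes.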
Similar resources
Self-organizing Maps in Natural Language Processing
Kohonen's Self-Organizing Map (SOM) is one of the most popular artificial neural network algorithms. Word category maps are SOMs that have been organized according to word similarities, measured by the similarity of the short contexts of the words. Conceptually interrelated words tend to fall into the same or neighboring map nodes. Nodes may thus be viewed as word categories. Although no a prior...
International Conference on Artificial Neural Networks, ICANN
Semantic roles of words in natural languages are reflected by the contexts in which they occur. These roles can explicitly be visualized by the Self-Organizing Map (SOM). In the experiments reported in this work the source data consisted of the raw text of Grimm fairy tales without any prior syntactic or semantic categorization of the words. The algorithm was able to create diagrams that seem to...
Contextual self-organizing map: software for constructing semantic representations.
In this article, we introduce a software package that applies a corpus-based algorithm to derive semantic representations of words. The algorithm relies on analyses of contextual information extracted from a text corpus--specifically, analyses of word co-occurrences in a large-scale electronic database of text. Here, a target word is represented as the combination of the average of all words pr...
Using NLP to Efficiently Visualize Text Collections with SOMs
Self-Organizing Maps (SOMs) are a good method to cluster and visualize large collections of text documents, but they are computationally expensive. In this paper, we investigate ways to use natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation, to improve efficiency. We find that reducing the document representation to just the heads of nou...
Learning to Understand - General Aspects of Using Self-Organizing Maps in Natural Language Processing
The Self-Organizing Map (SOM) is an artificial neural network model based on unsupervised learning. In this paper, the use of the SOM in natural language processing is considered. The main emphasis is on natural features of natural language including contextuality of interpretation, and the communicative and social aspects of natural language learning and usage. The SOM is introduced as a gener...
Publication date: 1997